Inductive matrix completion for predicting gene–disease associations
Abstract
MOTIVATION: Most existing methods for predicting causal disease genes rely on a specific type of evidence and are therefore limited in applicability. More often than not, the type of evidence available for diseases varies: for example, we may know linked genes, keywords associated with the disease obtained by text mining, or the co-occurrence of disease symptoms in patients. Similarly, the type of evidence available for genes varies: for example, specific microarray probes convey information only for certain sets of genes. In this article, we apply a novel matrix-completion method called Inductive Matrix Completion to the problem of predicting gene-disease associations; it combines multiple types of evidence (features) for diseases and genes to learn latent factors that explain the observed gene-disease associations. We construct features from different biological sources such as microarray expression data and disease-related textual data. A crucial advantage of the method is that it is inductive: it can be applied to diseases not seen at training time, unlike traditional matrix-completion approaches and network-based inference methods, which are transductive.
RESULTS: Comparison with state-of-the-art methods on diseases from the Online Mendelian Inheritance in Man (OMIM) database shows that the proposed approach is substantially better: it has close to a one-in-four chance of recovering a true association in the top 100 predictions, compared to the recently proposed Catapult method (second best), which has a <15% chance. We demonstrate that the inductive method is particularly effective for a query disease with no previously known gene associations, and for predicting novel genes, i.e. genes not previously linked to diseases. Thus the method is capable of predicting novel genes even for well-characterized diseases. We also validate the novelty of predictions by evaluating the method on recently reported OMIM associations and on associations recently reported in the literature.
AVAILABILITY: Source code and datasets can be downloaded from http://bigdata.ices.utexas.edu/project/gene-disease.
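The inductive idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the squared loss, the treatment of every unlabeled gene-disease pair as a zero, the gradient descent with step halving, the synthetic data, and all names (`fit_imc`, `score_new_disease`, `rank`, `lam`) are assumptions made for illustration. The association matrix is approximated as `X @ W @ H.T @ Y.T`, where `X` holds gene features and `Y` holds disease features, so a disease unseen at training time can be scored from its feature vector alone.

```python
import numpy as np


def objective(P, X, Y, W, H, lam):
    """Squared reconstruction error plus L2 penalty on the factors."""
    residual = X @ W @ H.T @ Y.T - P
    return 0.5 * np.sum(residual ** 2) + 0.5 * lam * (np.sum(W ** 2) + np.sum(H ** 2))


def fit_imc(P, X, Y, rank=2, lam=0.1, steps=200, seed=0):
    """Fit P ~ X W H^T Y^T by gradient descent with step halving.

    P: (genes x diseases) 0/1 associations (unknown pairs treated as 0,
       an assumption of this sketch)
    X: (genes x gene_features), Y: (diseases x disease_features)
    """
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((X.shape[1], rank))
    H = 0.1 * rng.standard_normal((Y.shape[1], rank))
    loss = objective(P, X, Y, W, H, lam)
    history = [loss]
    step = 1.0
    for _ in range(steps):
        residual = X @ W @ H.T @ Y.T - P
        grad_W = X.T @ residual @ (Y @ H) + lam * W
        grad_H = Y.T @ residual.T @ (X @ W) + lam * H
        # Halve the step until the objective strictly decreases.
        while step > 1e-12:
            W_new = W - step * grad_W
            H_new = H - step * grad_H
            new_loss = objective(P, X, Y, W_new, H_new, lam)
            if new_loss < loss:
                W, H, loss = W_new, H_new, new_loss
                history.append(loss)
                break
            step *= 0.5
        else:
            break  # no improving step found; stop early
    return W, H, history


def score_new_disease(X, W, H, y_new):
    """Score every gene for a disease described only by its features."""
    return X @ W @ (H.T @ y_new)


# Synthetic data (purely illustrative): 30 genes, 20 diseases.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5))   # gene features
Y = rng.standard_normal((20, 4))   # disease features
Z_true = rng.standard_normal((5, 4))
P = (X @ Z_true @ Y.T > 1.0).astype(float)

# Train without the last disease, then rank genes for it inductively.
W, H, history = fit_imc(P[:, :-1], X, Y[:-1])
scores = score_new_disease(X, W, H, Y[-1])
top_genes = np.argsort(-scores)[:5]
print("objective:", history[0], "->", history[-1])
print("top-ranked gene indices for the held-out disease:", top_genes)
```

Holding out an entire column of `P` mirrors the case of a query disease with no known gene associations: a transductive factorization would have no learned factor for that column, whereas here the score depends only on `Y[-1]` and the learned `W` and `H`.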
Similar resources
Disease Modeling via Large-Scale Network Analysis
A central goal of genetics is to learn how the genotype of an organism determines its phenotype. We address the implicit problem of predicting the association of genes with phenotypes or traits. Our primary goal is to develop pragmatic data analytic methods for linking specific genes to traits and diseases, especially polygenic trai...
Provable Inductive Matrix Completion
Consider a movie recommendation system where, apart from the ratings information, side information such as a user’s age or a movie’s genre is also available. Unlike standard matrix completion, in this setting one should be able to predict inductively on new users/movies. In this paper, we study the problem of inductive matrix completion in the exact recovery setting. That is, we assume that the rati...
Recommending Tumblr Blogs to Follow with Inductive Matrix Completion
In microblogging sites, recommending blogs (users) to follow is one of the core tasks for enhancing user experience. In this paper, we propose a novel inductive matrix completion based blog recommendation method to effectively utilize multiple rich sources of evidence such as the social network and the content as well as the activity data from users and blogs. Experiments on a large-scale real-...
MCMDA: Matrix completion for MiRNA-disease association prediction
Researchers have realized that microRNAs (miRNAs) play a significant role in many important biological processes and are closely connected with various complex human diseases. However, since there are too many possible miRNA-disease associations to analyze, it remains difficult to predict the potential miRNAs related to human diseases without a systematic and effective met...
Graph Matrix Completion in Presence of Outliers
The matrix completion problem has attracted a lot of attention in recent years. In the matrix completion problem, the goal is to recover a low-rank matrix from a subset of its entries. Graph matrix completion was introduced based on the fact that the relation between rows (or columns) of a matrix can be modeled as a graph structure. The graph matrix completion problem is formulated by adding the...